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© Operating system based performance monitoring of programs. 

© An operating provides a facility within its kernel 
for monitoring program performance. The facility 
may monitor user level programs as well as portions 
of the operating system, such as the kernel. The 
facility counts instructions and/or function calls to 
provide a useful performance metric to a user of the 
system. The count is forwarded to a user level 
monitoring program. The inclusion of the facility 
within the kernel enhances the speed of performance 
monitoring and enables the operating system to be 
directly monitored by the facility. 
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Technical Held 

The present invention relates generally to data 
processing systems and, more particularly, to per- 
formance monitoring of programs run on data pro- 
cessing systems. 

Background of the Invention 

A number of different statistical techniques 
have been developed for monitoring performance 
of a program run on a data processing system. 
One of the most prominent techniques is to monitor 
the time spent in executing respective functions of 
the program. One difficulty with this approach is 
that in many instances the granularity of the clock 
used in timing the functions is too large to provide 
an accurate picture of how processing time is dis- 
tributed amongst the respective functions. Instruc- 
tion counting is another statistical technique for 
monitoring performance of a program. Instruction 
counting techniques have generally been limited to 
monitoring the performance of user level programs 
(as opposed to system level programs), such as 
application programs. These techniques have not 
been applicable to monitoring the performance of 
operating system kernels. As a result of these 
limitations, statistical techniques have often not 
been useful to a programmer to help him enhance 
program performance. 

Summary of the Invention 

A method is practiced in a data processing 
system having a processor for executing instruc- 
tions and a storage mechanism for storing an op- 
erating system and a user level monitoring pro- 
gram. The processor executes instructions at a 
user level and a system level. The data processing 
system may be a distributed system. In this meth- 
od, a facility is provided in the kernel of the operat- 
ing system to monitor program performance. For 
instance, the facility may count the instructions 
executed by the processor, count the number of 
times that a section of code has been called during 
execution of a program or provide a number of 
separate counts categorized by function that speci- 
fy the number of instructions executed in the re- 
spective functions. A section of code of a program 
is executed on the processor. The facility in the 
kernel of the operating system is then used to 
monitor system performance. The type of facility 
used determines the values that are counted by the 
facility. The counts are reported to the user level 
monitoring program. For example, the facility may 
count the number of instructions when executing a 
section of code of the program or may count the 
number of times that a section of code of the 



program is called or the facility may provide sepa- 
rate counts categorized by function with instruc- 
tions executed in the functions when a first function 
is called during execution of the program. The 
5 program may be a user level program or part of 
the kernel of the operating system. 

Brief Description of the Drawings 

w Figure 1 is a block diagram of a distributed 

system for practicing a preferred embodiment of 
the present invention. 

Figure 2 is a block diagram showing a com- 
puter system of Figure 1 in more detail. 

75 Figure 3 is a block diagram showing an API of 

Figure 2 in more detail. 

Figure 4 is a flowchart illustrating the steps 
performed from a user's perspective in monitoring 
program performance and tuning the program in 

20 accordance with the preferred embodiment of the 
present invention. 

Figure 5 is a flowchart illustrating in more detail 
how breakpoints are inserted and how a program is 
run when monitored by the preferred embodiment 

25 of the present invention. 

Detailed Description of the Invention 

A preferred embodiment of the present inven- 

30 tion provides a facility within the kernel of an op- 
erating system to monitor performance of pro- 
grams. The facility provides a user with a number 
of options. First, The facility allows a user to count 
the number of instructions executed by a function 

35 each time that the function is called. Second, the 
facility allows the user to count the number and 
frequency of calls to a particular function without 
counting the instructions executed in the function, 
and third, the facility allows the user to count the 

40 number of instructions executed in each function 
that is called as a result of calling an initial function. 
Since the facility is provided in the kernel of the 
operating system, it may be used not only to 
monitor the performance of user level programs, 

45 such as application programs, but also may be 
used to monitor the performance of the kernel and 
other portions of the operating system. Providing 
the facility within the kernel of the operating system 
also enhances the speed of monitoring and pro- 

50 vides a nonintrusive approach to performance mon- 
itoring that does not require any modifications to 
the program being monitored. 

Trie preferred embodiment of the present in- 
vention may be used in a distributed system 10 

55 like that shown in Figure 1. It should also be 
appreciated that the present invention may be 
practiced in single processor system. Hie distrib- 
uted system includes a number of computer sys- 
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terns 12 that communicate through a network 14. 
Each of the computer systems 10 may concur- 
rently run a trace on the program it is executing. 
The network may be any of a number of different 
types of networks, including local area networks 
(LANs) or wide area networks. Those skilled in the 
art will appreciate that the distributed system 10 
may include a different number of computer sys- 
tems than shown in Figure 1 . The depiction of four 
computer systems in the distributed system is 
merely illustrative. 

Figure 2 is a more detailed block diagram of 
the components contained in one of the computer 
systems 12. Each of the computer systems need 
not have this configuration, but this configuration is 
intended to be merely illustrative. The computer 
system 12 includes a central processing unit (CPU) 
13 and a memory 16. The CPU has a number of 
registers 17, including a FLAGS register, and a 55 
(Stack Segment) register. These registers will be 
described in more detail below. The memory 16 
holds a copy of a distributed operating system 28 
such as the Microsoft NT Operating System, sold 
by Microsoft Corporation of Redmond, Washington. 
In the preferred embodiment of the present inven- 
tion, each of the computer systems 12 has a copy 
of the operating system that it runs. The operating 
system 28 includes a kernel 30, which, in turn, 
includes a section of code 32 that provides support 
for performance monitoring. The section of code 32 
will be described in more detail below. The mem- 
ory 16 may also hold at least one application 
program 26 that may be run on the CPU 13. The 
application program 26, however, need not reside 
in memory 16. 

The computer system 12 also includes a num- 
ber of peripheral devices. These peripheral devices 
include a secondary storage device 18, such as a 
magnetic disc drive, a keyboard 20, a video display 
22 and a mouse 24. 

Figure 3 is a block diagram showing the sec- 
tion of code 32 in more detail. The API includes a 
breakpoint facility 34 for supporting the generation 
of breakpoints within a program run on the CPU 13. 
The section of code 32 also includes a single step 
interrupt handler 36 for handling single step inter- 
rupts. In order to explain the role served by the 
breakpoint facility 34 and the single step interrupt 
handler 36, it is helpful first to review what break- 
points are and what single step interrupts are. 

A single step interrupt is an interrupt that is 
used when a program is single-stepping. Single- 
stepping refers to a mode of operation of the CPU 
13 in which one instruction is executed at a time 
under the control of a supervisory program. Single- 
stepping provides a mechanism for debuggers and 
performance monitors to execute a program slowly 
to carefully monitor execution of the program. In 



the present instance, the single step handler 36 is 
the supervisory program under which a program 
being monitored executes one instruction at a time. 
In general, an instruction is executed and then a 

5 single step interrupt is generated to switch control 
of the processor to the supervisory program. Many 
microprocessors, such as the 80X86 microproces- 
sors, provide a convenient mechanism for switch- 
ing the microprocessor into single-stepping mode. 

to For the 80X86 microprocessors, a single step trap 
flag is provided in the FLAGS register (see Figure 
2). When the single step trap flag has a value of 
one, the processor executes in single step mode 
and, conversely, when the single step trap flag has 

75 a value of zero, the microprocessor does not op- 
erate in single-stepping mode. 

A breakpoint refers to a breakpoint interrupt 
that is used to break the uninhibited execution of a 
program and is helpful in debugging and perfor- 

20 mance monitoring programs. The difficulty with ex- 
ecuting an entire program in single-stepping mode 
is that it is very slow. In many instances, the 
majority of a program may already be debugged or 
tuned and only a small portion needs to be exam- 

25 ined in more detail. A breakpoint interrupt is espe- 
cially suitable in such circumstances. A breakpoint 
interrupt is generated by inserting a special opcode 
into a program. This opcode, when executed, 
causes a breakpoint interrupt to be generated. 

30 Upon generation of the breakpoint interrupt, control 
is transferred to the breakpoint facility 34, which 
may then perform further actions as needed. The 
role of breakpoints and single-step interrupts in the 
preferred embodiment of the present invention will 

35 be described in more detail below with regard to 
the discussion of Figures 4 and 5. 

In addition to the breakpoint facility 34 and the 
single step handler 36, the section of code 32 also 
includes code relating to context switching 38. This 

40 code 38 provides a hook for context switches and 
includes a handler that looks at the context switch 
and determines whether the new context should be 
single-stepped or not. If the new context is to be 
single-stepped, the handler makes certain that sin- 

45 gle-stepping occurs. 

The section of code 32 also includes counter 
values 40. As was discussed above, the preferred 
embodiment of the present invention provides a 
user with an opportunity to monitor the number of 

so instructions executed in the function, the number of 
times a function is called within a program and the 
number of instructions executed for each function 
that is called as a result of calling an initial function. 
Each of these counts is stored in a separate coun- 

55 ter, and the values of the counters are held as 
counter values 40 in the data area of the section of 
code 32. 
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Figure 4 is a flow chart of the steps performed 
to monitor the performance of a program or a 
portion of a program within the preferred embodi- 
ment of the present invention. Initially, the user 
selects a performance metric (step 42). A user 
interface, such as a menu of dialog box, is pro- 
vided on the video display 22 (Figure 2) of the 
computer system 12 (Figure 1) that he is using, to 
enable the user to select the desired performance 
metric. As was discussed above, there are a num- 
ber of different performance metrics that may be 
calculated by the preferred embodiment of the 
present invention. The user also chooses what 
functions in the program are to be monitored (step 
44) . Once the performance metrics have been 
selected and the functions to be monitored have 
been selected, opcodes for generating breakpoint 
interrupts are inserted into each of the functions 
that is to be monitored, and the program or portion 
of the program is executed (step 46). The appro- 
priate counter values 40 are calculated as the pro- 
gram is run. The counter values 40 are displayed 
on the video display 22 as the program executes 
(step 48 in Figure 4). When the program or portion 
of a program is done executing, the user may 
examine the performance data to guide him in 
appropriately tuning the program (step 50) . How 
the data is used to tune the program depends upon 
the type of data that is gathered. Those skilled in 
the art will appreciate how to use the performance 
metrics data to appropriately tune the program. 

Figure 5 is a flow chart showing the steps 
performed on a function-by-function basis in insert- 
ing a breakpoint and executing the program (i.e., 
step 46 in Figure 4) in more detail. As mentioned 
above, a breakpoint opcode is placed at the first 
instruction of each function to be monitored (step 
52 in Figure 5). The system then begins execution 
of the function (step 54). Since the breakpoint is 
the first instruction in the function, the kernel 
catches the breakpoint (step 56) and then deter- 
mines what performance metric the user has re- 
quested for the function (step 58). If the only per- 
formance metric requested is a count of the num- 
ber of calls to the function, a counter in the counter 
values 40 (Figure 3), that tracks the number of calls 
to the function, is incremented to indicate that the 
function has been called (step 60). Once the coun- 
ter has been incremented, the function is restarted 
and resumes execution at full speed. 

Before considering the steps that are per- 
formed in the preferred embodiment of the present 
invention when the user wishes to monitor the 
number of instructions executed, it is first helpful to 
introduce the notion of a "thread" and a "stack." A 
thread represents an executable portion of a task 
that can be run independently and concurrently 
with other threads. The operating system 28 is a 



multithreading system. Those skilled in the art will 
appreciate that the present invention need not be 
practiced in a multithreaded system. It may also be 
practiced in single process and multiprocessing 

5 systems. A multithreading operating system is able 
to execute multiple threads concurrently. The stack 
is a last-in-first-out (LIFO) data structure that is 
used to hold data and register information for the 
thread. Each thread has a portion of the stack 

io segment dedicated to it. The SS (Stack Segment) 
register (Figure 2) points to a segment in memory 
that holds stacks for the threads of the program 
being executed. Each thread has a stack pointer 
(SP) value associated with it that points to the top 

75 of its stack (i.e., the portion of the stack segment 
dedicated to the thread). Those skilled in the art 
will appreciate that the present invention may also 
be practiced in environments that do not use 
stacks. 

20 When in step 58 of Figure 5 it is determined 
that the user wishes to count a number of instruc- 
tions executed rather than counting the number of 
function calls, the stack pointer for the currently 
executing thread is recorded in memory 16 (step 

25 64). The single step trap flag in the FLAGS register 
(Figure 2) is then set to cause the CPU 13 to enter 
single-stepping mode (step 66). In single-stepping 
mode, each time that an instruction is executed, 
the counter value that keeps track of the number of 

30 instructions executed for the current function is 
incremented (step 68). The instructions are count- 
ed until the thread exits the function being mon- 
itored or the thread terminates. The instruction 
counting may also be suspended when a thread 

35 deschedules itself, but the instruction counting re- 
sumes after the thread is again scheduled. 

While the present invention has been de- 
scribed with reference to a preferred embodiment 
thereof, those skilled in the art will appreciate that 

40 various changes in form and detail may be made 
without departing from the present invention as 
defined in the appended claims. 

Claims 

45 

1. In a data processing system having a proces- 
sor that runs at a system level or a user level 
when executing instructions and a storage 
mechanism for storing an operating system 
so that includes a kernel, a method comprising 
the steps of: 

providing a facility in the kernel of the 
operating system for counting instructions ex- 
ecuted by the processor and a user level mon- 
55 itoring program for monitoring the instructions 

counted by the facility; 

executing a section of code on the proces- 
sor; 
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using the facility in the kernel of the op- 
erating system to count the number of instruc- 
tions executed when executing the section of 
code on the processor; and 

reporting the count of the number of 
instructions executed to the user level monitor- 
ing program. 

2. The method recited in claim 1 wherein the 
section of code that is executed on the proces- 
sor is part of the kernel of the operating sys- 
tem. 

3. TTie method recited in claim 1 wherein the 
section of code that is executed on the proces- 
sor is part of a user level program. 

4. The method recited in claim 1 wherein the 
section of code that is executed on the proces- 
sor is a non-kernel portion of the operating 
system. 

5. The method recited in claim 1, further compris- 
ing the step of storing the count of the number 
of instructions executed when executing the 
section of code on the processor in the storage 
mechanism. 

6. The method recited in claim 1 wherein the 
section of code is a function and the step of 
executing the section of code on the processor 
further comprises the step of executing the 
function on the processor. 

7. The method recited in claim 1, further compris- 
ing the step of tuning the section of code 
executed on the processor to decrease the 
number of instructions that are executed when 
the section of code is executed on the proces- 
sor. 

a The method recited in claim 1 wherein the 
data processing system is a distributed system 
that also includes a second processor and 
wherein the method further comprises the 
steps of: 

executing a second section of code on the 
second processor; 

using the facility in the kernel of the op- 
erating system to count the number of instruc- 
tions executed when executing the second 
section of code on the second processor; and 

reporting the count of instructions execut- 
ed when executing the second section of code 
to the user level monitoring program. 

9. The method recited in claim 8, further compris- 
ing the step of storing the count of the number 



of instructions executed when executing the 
second section of code on the second proces- 
sor in the storage mechanism. 

5 10. In a data processing system having a proces- 
sor that runs at a system level or a user level 
when executing instructions and a storage 
mechanism for storing an operating system 
that includes a kernel, a method comprising 

w the steps of: 

providing a facility in the kernel of the 
operating system for counting a number of 
times that a section of code has been called 
during execution of a program and a user level 

75 monitoring program for monitoring the instruc- 

tions counted by the facility; 

executing at least a portion of a program 
that includes a first section of code on the 
processor; 

20 using the facility in the kernel of the op- 

erating system to count a number of times that 
the first section of code is called during execu- 
tion of the portion of the program on the pro- 
cessor; and 

25 reporting the count to the user level moni- 

toring program. 

11. The method recited in claim 10 wherein the 
program is part of the kernel of the operating 

30 system. 

12. The method recited in claim 10 wherein the 
program is a user level program. 

35 13. The method recited in claim 10 wherein the 
program is a non-kernel portion of the operat- 
ing system. 

14. The method recited in claim 10, further com- 
40 prising the step of storing the count of the 

number of times that the first section of code 
is called during execution of the portion of the 
program in the storage mechanism. 

45 15. The method recited in claim 10 wherein the 
first section of code is a function and the step 
of using the facility in the kernel of the operat- 
ing system to count the number of times that 
the first section of code is called during execu- 

50 tion of the portion of the program further com- 

prises the step of using the facility in the 
kernel of the operating system to count the 
number of times that the function is called 
during execution of the portion of the program. 

55 

16. The method recited in claim 10, further com- 
prising the step of using the count of the 
number of times that the first section of code 
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is called during execution of the portion of the 
program to direct tuning of performance of the 
program. 

17. The method recited in claim 10 wherein the 
program includes a second section of code 
and wherein the method further comprises the 
step of using the facility in the kernel of the 
operating system to count a number of times 
that the second section of code is called dur- 
ing execution of the portion of the program on 
the processor and reporting the count of times 
that the second section of code is called to the 
user level monitoring program. 

18. The method recited in claim 10, further com- 
prising the step of storing the count of the 
number of items that the second section of 
code is called during execution of the portion 
of the program on the processor in the storage 
mechanism. 

19. The method recited in claim 10 wherein the 
data processing system is a distributed system 
that also includes a second processor and 
wherein the method further comprises the 
steps of: 

executing at least a portion of a second 
program that includes a section of code on the 



categorized by function of instructions execut- 
ed in the first function and the other functions 
when the first function is called during execu- 
tion of the program; and 
5 reporting the counts to the user level mon- 

itoring program. 

22. The method recited in claim 21 wherein the 
program is part of the kernel of the operating 
10 system. 

23b The method recited in claim 21 wherein the 
program is a user level program. 

75 24. The method recited in claim 21 wherein the 
program is a non-kernel portion of the operat- 
ing system. 

25. The method recited in claim 21, further com- 
20 prising the step of storing the separate counts 

in the storage mechanism. 

26. The method recited in claim 21, further com- 
prising the step of tuning performance of the 

25 program using the separate counts to direct 
the tuning. 

27. A system for measuring performance of a 
monitored program of instructions, comprising: 

a storage mechanism for storing a user 
level monitoring program and an operating 
system that includes a kernel, said kernel com- 
prising a facility for counting instructions that 
are executed; and 

a processor comprising: 

(i) a program execution unit for executing 
the monitored program and the user level 
monitoring program; 

(ii) a facility invoker for invoking the facility 
to count the number of instructions execut- 
ed by at least a portion of the monitored 
program; and 

(iii) a count reporter for reporting the counts 
gathered by the facility to the user level 
monitoring program. 

28. A system for measuring performance of a 
monitored program of instructions, said mon- 
itored program including a function, the system 
comprising; 

a storage mechanism for storing a user 
level monitoring program and an operating 
system that includes a kernel, said kernel com- 
prising a facility for counting a number of times 
that a function is called; 
a processor comprising: 
(i) a program execution unit for executing 
the monitored program and the user level 



processor; 30 

using the facility in the kernel of the op- 
erating system to count a number of times that 
the section of code of the second program is 
called during execution of the portion of the 
second program; and 35 

reporting the count of the number of times 
that the section of code of the second program 
is called to the user level monitoring program. 

20. The method recited in claim 19, further com- 40 
prising the step of storing the count of the 
number of times that the section of code of the 
second program is called during the execution 
of the portion of the second program in the 
storage mechanism. 45 

21. In a data processing system having a proces- 
sor capable of running at a system level and a 
user level for executing instructions and a stor- 
age mechanism for an operating system hav- 50 
ing a kernel, a method comprising the steps of: 

providing a facility in the kernel of the 
operating system for counting instructions ex- 
ecuted by the processor; 

executing a first function that calls other 55 
functions in the program on the processor; 

using the facility in the kernel of the op- 
erating system to provide separate counts 
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monitoring program; 

(ii) a facility invoker for invoking the facility 
to count the number of times that the func- 
tion in the monitored program is called dur- 
ing execution of the monitored program; 5 
and 

(Hi) a count reporter for reporting the count 
to the user level monitoring program. 
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